22 research outputs found

    Efficient Privacy Preserving Viola-Jones Type Object Detection via Random Base Image Representation

    Full text link
    A cloud server invests considerable time, energy, and money to train a Viola-Jones type object detector with high accuracy. Clients can upload their photos to the cloud server to detect objects, but a client does not want the content of his/her photos leaked. Meanwhile, the cloud server is also reluctant to leak any parameters of the trained object detector. Ten years ago, Avidan & Butman introduced Blind Vision, a method for securely evaluating a Viola-Jones type object detector. Blind Vision uses standard cryptographic tools and is painfully slow to compute, taking a couple of hours to scan a single image. The purpose of this work is to explore an efficient method that speeds up the process. We propose the Random Base Image (RBI) representation: the original image is divided into random base images, and only the base images are submitted, in random order, to the cloud server, so the content of the image cannot be leaked. Meanwhile, a random vector and the secure Millionaire protocol are leveraged to protect the parameters of the trained object detector. The RBI representation re-enables the integral image, which greatly accelerates detection. The experimental results reveal that our method retains the detection accuracy of the plain vision algorithm and is significantly faster than traditional blind vision, with only a very low theoretical probability of information leakage. Comment: 6 pages, 3 figures. To appear in the proceedings of the IEEE International Conference on Multimedia and Expo (ICME), Jul 10-14, 2017, Hong Kong.
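    The abstract does not spell out the exact RBI construction, but the core idea (splitting an image into random-looking base images that sum back to the original, so each share alone reveals only noise) can be sketched as a minimal additive secret sharing over pixel values. The function name and the noise range are illustrative, not from the paper; note that because the integral image is a linear operator, it can be computed per base image and summed, which is what makes integral-image acceleration possible again.

```python
import numpy as np

def random_base_images(image: np.ndarray, n: int, rng=None):
    """Split an image into n random base images that sum back to the original.

    Each base image alone looks like noise; only the full (secretly held)
    collection reconstructs the content.
    """
    rng = np.random.default_rng(rng)
    bases = [rng.integers(-128, 128, size=image.shape).astype(np.int32)
             for _ in range(n - 1)]
    # The last base image absorbs the residual so the shares sum to the original.
    bases.append(image.astype(np.int32) - sum(bases))
    return bases

img = np.arange(12, dtype=np.int32).reshape(3, 4)
parts = random_base_images(img, n=4, rng=0)
assert np.array_equal(sum(parts), img)
```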

    The Effects of Yoga on College Students' Mental Health: A Systematic Review

    Get PDF
    The mental health of college students is an increasingly serious public health problem, and effective, healthy interventions are needed. More and more research has been conducted on yoga, but there are few randomized controlled trials (RCTs) on the effects of yoga interventions on students' mental health. Therefore, this study examined the effects and quality of yoga interventions on mental health in college students. We searched PubMed (Medline), Cochrane, Web of Science, CNKI, the VIP Chinese Science and Technology Journal Database (VIP), and the WanFang Database for RCTs of yoga interventions on college students' mental health. After screening, 17 articles met the requirements and were included, and the Cochrane risk-of-bias assessment tool RoB 2.0 was used to evaluate their quality. Of the 17 articles reviewed, three were rated as low risk of bias, five as possibly at risk of bias, and nine as high risk of bias. The 17 studies are predominantly of low methodological quality and lack multi-centered, large-sample collaborative research. Almost all researchers mentioned the use of randomization in their articles but did not indicate which randomization method was used. There was no description of allocation concealment, blinding, case shedding, case follow-up, etc., so it was impossible to judge whether the trial design was correct or whether random grouping was indeed undertaken. This study found that most of the so-called randomized controlled trials are doubtful, which reduces the strength and credibility of this review. Therefore, improving the research quality of yoga interventions and standardizing the writing of scientific research articles are problems that need to be solved in the current field of sports psychology research in China.
Current evidence shows that yoga exercise can relax the body and mind, thereby improving mental health, and that complete yoga (exercise, breathing, meditation) significantly relieves the symptoms of depression. Performing yoga postures and exercises promotes blood circulation, effectively improves sleep, and regulates breathing to stabilize the autonomic nervous system, relieve stress, and eliminate mental tension. In the future, yoga practice could be used as a non-medical intervention to treat mental illness. The quality of current randomized controlled trials of yoga interventions on the mental health of college students is generally low. Randomized controlled trials with reasonable methodological design, strict implementation, and sufficient follow-up time are still needed. It is recommended that researchers strengthen the systematic study of clinical trial methodology and strictly follow the Cochrane handbook checklist for clinical research reports in order to improve the quality of literature reports.

    ModelScope Text-to-Video Technical Report

    Full text link
    This paper introduces ModelScopeT2V, a text-to-video synthesis model that evolves from a text-to-image synthesis model (i.e., Stable Diffusion). ModelScopeT2V incorporates spatio-temporal blocks to ensure consistent frame generation and smooth movement transitions. The model can adapt to varying frame numbers during training and inference, rendering it suitable for both image-text and video-text datasets. ModelScopeT2V brings together three components (i.e., VQGAN, a text encoder, and a denoising UNet), comprising 1.7 billion parameters in total, of which 0.5 billion are dedicated to temporal capabilities. The model demonstrates superior performance over state-of-the-art methods across three evaluation metrics. The code and an online demo are available at \url{https://modelscope.cn/models/damo/text-to-video-synthesis/summary}. Comment: Technical report. Project page: \url{https://modelscope.cn/models/damo/text-to-video-synthesis/summary}

    Enlarging Instance-specific and Class-specific Information for Open-set Action Recognition

    Full text link
    Open-set action recognition aims to reject unknown human action cases that are out of the distribution of the training set. Existing methods mainly focus on learning better uncertainty scores but overlook the importance of feature representations. We find that features with richer semantic diversity can significantly improve the open-set performance under the same uncertainty scores. In this paper, we begin by analyzing the feature representation behavior in the open-set action recognition (OSAR) problem based on the information bottleneck (IB) theory, and propose to enlarge the instance-specific (IS) and class-specific (CS) information contained in the feature for better performance. To this end, a novel Prototypical Similarity Learning (PSL) framework is proposed to keep the instance variance within the same class to retain more IS information. In addition, we notice that unknown samples sharing similar appearances with known samples are easily misclassified as known classes. To alleviate this issue, video shuffling is further introduced in our PSL to learn distinct temporal information between original and shuffled samples, which we find enlarges the CS information. Extensive experiments demonstrate that the proposed PSL can significantly boost both the open-set and closed-set performance and achieves state-of-the-art results on multiple benchmarks. Code is available at https://github.com/Jun-CEN/PSL. Comment: To appear at CVPR2023
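    The video-shuffling idea above can be illustrated with a minimal sketch (the function name and clip layout are assumptions, not the paper's code): permuting a clip's frames keeps its appearance statistics but destroys temporal order, giving a contrastive view that forces the model to encode class-specific temporal structure.

```python
import numpy as np

def shuffle_frames(clip: np.ndarray, rng=None):
    """Return a temporally shuffled copy of a video clip of shape (T, H, W, C).

    The shuffled clip keeps per-frame appearance but breaks temporal order,
    so contrasting it against the original isolates temporal information.
    """
    rng = np.random.default_rng(rng)
    order = rng.permutation(clip.shape[0])  # random frame permutation
    return clip[order], order

# Toy clip: 5 frames of shape (2, 2, 3), frame t filled with the value t.
clip = np.arange(5, dtype=float)[:, None, None, None] * np.ones((5, 2, 2, 3))
shuffled, order = shuffle_frames(clip, rng=0)
assert shuffled.shape == clip.shape
```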

    CMDFusion: Bidirectional Fusion Network with Cross-modality Knowledge Distillation for LIDAR Semantic Segmentation

    Full text link
    2D RGB images and 3D LIDAR point clouds provide complementary knowledge for the perception system of autonomous vehicles. Several 2D and 3D fusion methods have been explored for the LIDAR semantic segmentation task, but they suffer from different problems. 2D-to-3D fusion methods require strictly paired data during inference, which may not be available in real-world scenarios, while 3D-to-2D fusion methods cannot explicitly make full use of the 2D information. Therefore, we propose a Bidirectional Fusion Network with Cross-Modality Knowledge Distillation (CMDFusion) in this work. Our method makes two contributions. First, our bidirectional fusion scheme explicitly and implicitly enhances the 3D feature via 2D-to-3D fusion and 3D-to-2D fusion, respectively, which surpasses either single fusion scheme alone. Second, we distill the 2D knowledge from a 2D network (Camera branch) into a 3D network (2D knowledge branch) so that the 3D network can generate 2D information even for points outside the camera's field of view (FOV). In this way, RGB images are no longer required during inference, since the 2D knowledge branch provides 2D information based on the 3D LIDAR input alone. We show that our CMDFusion achieves the best performance among all fusion-based methods on the SemanticKITTI and nuScenes datasets. The code will be released at https://github.com/Jun-CEN/CMDFusion.

    Evaluation of ChatGPT Family of Models for Biomedical Reasoning and Classification

    Full text link
    Recent advances in large language models (LLMs) have shown impressive ability in biomedical question-answering, but LLMs have not been adequately investigated for more specific biomedical applications. This study investigates the performance of LLMs such as the ChatGPT family of models (GPT-3.5s, GPT-4) in biomedical tasks beyond question-answering. Because no patient data can be passed to the OpenAI API public interface, we evaluated model performance with over 10,000 samples as proxies for two fundamental tasks in the clinical domain: classification and reasoning. The first task is classifying whether statements of clinical and policy recommendations in the scientific literature constitute health advice. The second task is causal relation detection from the biomedical literature. We compared LLMs with simpler models, such as bag-of-words (BoW) with logistic regression, and with fine-tuned BioBERT models. Despite the viral excitement around ChatGPT, we found that fine-tuning for these two fundamental NLP tasks remained the best strategy. The simple BoW model performed on par with the most complex LLM prompting, and prompt engineering required significant investment. Comment: 28 pages, 2 tables and 4 figures. Submitted for review
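    The BoW-plus-logistic-regression baseline the authors compare against is a standard pipeline; a minimal sketch follows, using tiny made-up sentences (the paper's actual corpora are health-advice and causal-relation datasets, not these examples).

```python
# Minimal bag-of-words + logistic regression baseline (toy data, illustrative only).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "patients should exercise daily",        # health advice
    "clinicians must screen for diabetes",   # health advice
    "the study enrolled 120 participants",   # not advice
    "data were collected in 2019",           # not advice
]
labels = [1, 1, 0, 0]

# CountVectorizer builds the bag-of-words features; LogisticRegression classifies.
model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(texts, labels)
prediction = model.predict(["doctors should recommend vaccination"])
```

The same pipeline shape applies to the causal-relation task by swapping in sentence pairs and relation labels.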

    VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation

    Full text link
    A diffusion probabilistic model (DPM), which constructs a forward diffusion process by gradually adding noise to data points and learns the reverse denoising process to generate new samples, has been shown to handle complex data distributions. Despite its recent success in image synthesis, applying DPMs to video generation is still challenging due to the high-dimensional data space. Previous methods usually adopt a standard diffusion process, where frames in the same video clip are corrupted with independent noise, ignoring content redundancy and temporal correlation. This work presents a decomposed diffusion process that resolves the per-frame noise into a base noise shared among all frames and a residual noise that varies along the time axis. The denoising pipeline employs two jointly learned networks to match this noise decomposition. Experiments on various datasets confirm that our approach, termed VideoFusion, surpasses both GAN-based and diffusion-based alternatives in high-quality video generation. We further show that our decomposed formulation can benefit from pre-trained image diffusion models and supports text-conditioned video creation. Comment: Accepted to CVPR2023
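    The noise decomposition described above can be sketched in a few lines. The mixing weight `alpha` and the `sqrt(alpha)/sqrt(1-alpha)` combination below are illustrative assumptions (chosen so each frame's noise stays marginally standard Gaussian), not the paper's exact parameterization.

```python
import numpy as np

def decomposed_noise(num_frames, frame_shape, alpha=0.7, rng=None):
    """Per-frame noise = shared base noise + per-frame residual noise.

    alpha controls how much noise is shared across frames; combining the two
    parts as sqrt(alpha)*base + sqrt(1-alpha)*residual keeps each frame's
    noise marginally unit-variance Gaussian.
    """
    rng = np.random.default_rng(rng)
    base = rng.standard_normal(frame_shape)                     # shared across frames
    residual = rng.standard_normal((num_frames, *frame_shape))  # varies over time
    return np.sqrt(alpha) * base + np.sqrt(1 - alpha) * residual

noise = decomposed_noise(8, (4, 4), alpha=0.7, rng=0)
assert noise.shape == (8, 4, 4)
```

With `alpha = 1` every frame receives identical noise; with `alpha = 0` the process reduces to the standard independent-noise diffusion the paper argues against.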

    An intensity-enhanced method for handling mobile laser scanning point clouds

    No full text
    Currently, mobile laser scanning (MLS) systems can conveniently and rapidly measure the backscattered laser beam properties of object surfaces in large-scale roadway scenes. These properties are digitized as intensity values stored in the acquired point cloud data, and intensity, as an important information source, has been widely used in a variety of applications, including road marking inventory, manhole cover detection, and pavement inspection. However, the collected intensity often deviates from the object reflectance due to two main factors: varying scanning distances and worn-out surfaces. Therefore, in this paper, we present a new intensity-enhancement method that gradually and efficiently enhances the intensity in MLS point clouds. Concretely, to eliminate the intensity inconsistency caused by different scanning distances, the direct relationship between scanning distance and intensity value is modeled to correct the inconsistent intensity. To handle the low contrast between 3D points with different intensities, we propose to adapt the dark channel prior for adaptively transforming the intensity information in point cloud scenes. To remove isolated intensity noise, multiple filters are integrated to achieve denoising in regions with different point densities. Our proposed method is evaluated on four MLS datasets acquired in different road scenarios with different MLS systems. Extensive experiments and discussions demonstrate that the proposed method exhibits remarkable performance in enhancing the intensities in MLS point clouds.
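    The abstract does not give the paper's distance-intensity model, but the commonly used range correction for laser intensity is an inverse power law in scanning distance, which can be sketched as follows (`d_ref` and the exponent `n` are illustrative defaults, not values from the paper):

```python
import numpy as np

def correct_intensity(intensity, distance, d_ref=10.0, n=2.0):
    """Normalize raw MLS intensity to a reference scanning distance.

    Assumes the common inverse power-law range dependence I ∝ 1 / d**n,
    so the corrected value is I_raw * (d / d_ref)**n.
    """
    return intensity * (np.asarray(distance) / d_ref) ** n

# A point measured at 20 m with raw intensity 25 maps to 100 at the 10 m reference.
print(correct_intensity(25.0, 20.0))  # 100.0
```

After this per-point correction, points on the same surface scanned from different ranges share comparable intensities, which is the precondition for the contrast-enhancement and denoising steps that follow.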

    Measuring Pointwise V-Usable Information In-Context-ly

    No full text